A Game-theoretic Formulation of the Homogeneous Self-reconfiguration Problem
Abstract: In this paper we formulate the homogeneous two- and three-dimensional self-reconfiguration problem over discrete grids as a constrained potential game. We develop a game-theoretic learning algorithm based on the Metropolis-Hastings algorithm that solves the self-reconfiguration problem in a globally optimal fashion. Both a centralized and a fully distributed algorithm are presented, and we show that the only stochastically stable state is the potential function maximizer, i.e. the desired target configuration. These algorithms compute transition probabilities in such a way that even though each agent acts in a self-interested way, the overall collective goal of self-reconfiguration is achieved. Simulation results confirm the feasibility of our approach and show convergence to desired target configurations.


I. Introduction 

Self-reconfigurable systems are composed of individual agents which are able to connect to and disconnect from one another to form larger functional structures. These individual agents or modules can have distinct capabilities, shapes, or sizes, in which case we call the system heterogeneous (for example [8]). Alternatively, modules can be identical and interchangeable, which describes a homogeneous system (see [11]). In this paper, we present algorithms that reconfigure homogeneous systems and treat self-reconfiguration as a two- and three-dimensional coverage problem.

Self-reconfiguration is understood here as the solution to the following problem. Given an initial geometric arrangement of cubes (called a configuration) $C_I$ and a desired target configuration $C_T$, the solution to the self-reconfiguration problem is a sequence of primitive cube motions that reshapes the initial configuration into the target configuration (see Fig. 1). By configuration we mean a geometric arrangement of a set of agents. The problem setup is then the following.

• The environment $\mathcal{E}$ is a finite two- or three-dimensional discrete grid, i.e. $\mathcal{E} \subset \mathbb{Z}^2$ or $\mathcal{E} \subset \mathbb{Z}^3$.

• $N$ agents $\mathcal{P} = \{1, 2, \dots, N\}$ move in discrete steps through that grid.

• Each agent has a restricted action set $\mathcal{R}_i$, which contains only a subset of all its possible actions $\mathcal{A}_i$.

• An agent's utility or reward $U_i(a)$, $a \in \mathcal{A}$, decreases with its distance to the target configuration.

Many approaches to self-reconfiguration have been presented in the literature, each with certain shortcomings. There have been centralized solutions (both in planning and execution, e.g. [19]), distributed solutions that require a large amount of communication (see [8]) or precomputation (see [7], [17]), and approaches that focus on either locomotion (e.g. [5]) or functional target shape assemblies (e.g. [12]). Distributed approaches have often relied on precomputation of rulesets (see [17]), policies (e.g. [7]), or entire sets of paths/folding schemata of agents (e.g. [6]).

In this paper, we present a fully decentralized approach to homogeneous self-reconfiguration that requires no central decision maker. In its current form, each agent needs to know the location and shape of the target configuration to compute the utility of its actions and choose an action. However, a successful reconfiguration requires no communication in the two-dimensional case and only limited communication in the three-dimensional case.

The rest of this paper is organized as follows. Section II discusses relevant related work. Section III presents the system setup and the theoretical formulation of the problem. In Section IV we discuss the completeness of deterministic reconfiguration, which is used in Section V to prove the existence of a unique potential function maximizer. In Section VI we decentralize the stochastic algorithm, and we present simulation results in Section VII. Section VIII concludes the paper.


II. Related Work 

In this section we highlight decentralized approaches in the literature that bear resemblance to the methods presented in this paper. Especially relevant to the presented results are homogeneous self-reconfiguration approaches such as [5] and [11], which employ cellular automata and manually designed local rules to model the system. Similar work in [7] shows an approach for locomotion through self-reconfiguration represented as a Markov decision process with state-action pairs computed in a distributed fashion. The presented algorithms are decentralized but only applicable to locomotion, not to the assembly of arbitrary configurations.

Precomputed rules are also used in [17], in which graph grammars for self-reconfiguration are automatically generated. Similarly, in [9] a decentralized approach is presented based on graph grammatical rules, in which automatically generated graph grammars are used to assemble arbitrary acyclic target configurations. Whereas these approaches are able to assemble arbitrary target configurations, they rely on precomputing rulesets for every target configuration.

Fig. 1: Example of a self-reconfiguration sequence from a random 2D configuration (left) to another random 2D configuration (right). Approximately every 20th time step is shown.

The algorithms presented in this paper are inspired by the coverage control literature, specifically game-theoretic formulations such as [1], [13], and [21], which discuss both static sensor coverage and dynamic coverage control with mobile agents. Note, however, that agents in these papers are limited to movement in two dimensions and operate with a different motion model and, most importantly, different constraints.

We address these problems with a decentralized algorithm that does not rely on precomputed rulesets, can be used for locomotion as well as the assembly of arbitrary two- and three-dimensional shapes, can handle changing environment conditions as well as changing target configurations, and is scalable to a large number of modules due to its decentralized nature.

III. Problem Formulation 

In this work, we represent agents as cubic modules that move through a discrete lattice or environment $\mathcal{E} = \mathbb{Z}^d$ in discrete steps.¹ Without loss of generality these cubes have unit dimension. Therefore, an agent's current state or action $a_i$ (in a game-theoretic sense) is an element $a_i \in \mathbb{Z}^d$. Note that an agent's action is equivalent to its position in the lattice. Cubes can be thought of as geometric embeddings of the agents in our system. A collection of agents is furthermore called a configuration. Therefore, a configuration $C$ composed of $N$ agents is a subset of the representable space $\mathbb{Z}^{dN}$ (see [18]). Moreover, we deal with homogeneous configurations, in which all agents have the same properties and are completely interchangeable.

1 Since we present two- and three-dimensional self-reconfiguration, throughout this paper the dimensionality $d$ will be $d \in \{2, 3\}$.


A. Motion Model 

In the sliding cube model (see [4], [5], [17], [18], [19]), a cube is able to perform two primitive motions: a sliding and a corner motion. In general, a motion specifies a translation along coordinate axes and is represented by an element $m \in \mathbb{Z}^d$. A sliding motion is characterized by $\|m_s\|_{L_1} = 1$, i.e. $|m_{s,i}| = 1$ for one and only one coordinate $i \in \{1, \dots, d\}$, which translates a cube along one coordinate axis. A corner motion, on the other hand, is defined by $\|m_c\|_{L_1} = 2$ such that $|m_{c,i}| = 1$ for exactly two coordinates $i \in \{1, \dots, d\}$, which translates a cube along two dimensions.
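The two motion primitives can be made concrete by enumerating all offset vectors with entries in $\{-1, 0, 1\}$ and sorting them by their $L_1$ norm. The following is a minimal sketch; the helper name `primitive_motions` is hypothetical and not part of the paper.

```python
from itertools import product


def primitive_motions(d):
    """Enumerate the sliding and corner motion vectors m in Z^d.

    Sliding motions have ||m||_1 = 1 (one coordinate is +1 or -1);
    corner motions have ||m||_1 = 2 with two coordinates equal to +1 or -1.
    """
    sliding, corner = [], []
    for m in product((-1, 0, 1), repeat=d):
        l1 = sum(abs(c) for c in m)
        if l1 == 1:
            sliding.append(m)
        elif l1 == 2:
            corner.append(m)
    return sliding, corner


for d in (2, 3):
    s, c = primitive_motions(d)
    print(d, len(s), len(c))
```

For $d = 2$ this yields 4 sliding and 4 corner motions; for $d = 3$ it yields 6 sliding and 12 corner motions.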

B. Game-theoretic Formulation 

In this section we formulate homogeneous self-reconfiguration as a potential game (see [16]), which is a game structure amenable to globally optimal solutions. Generally, a game is specified by a set of players $i \in \mathcal{P} = \{1, 2, \dots, N\}$, a set of actions $\mathcal{A}_i$ for each player, and a utility function $U_i(a) = U_i(a_i, a_{-i})$ for every player $i$. In this notation, $a$ denotes a joint action profile $a = (a_1, a_2, \dots, a_N)$ of all $N$ players, and $a_{-i}$ denotes the actions of all players other than agent $i$.

In a constrained game, the actions of agents are constrained through their own and other agents' actions. In other words, given an action set $\mathcal{A}_i$ for agent $i$, only a subset $\mathcal{R}_i(a) \subseteq \mathcal{A}_i$ is available to agent $i$. A constrained potential game is defined as follows.

Definition 1: A constrained exact potential game (see [21]) is a tuple $G = (\mathcal{P}, \mathcal{A}, \{U_i(\cdot)\}_{i \in \mathcal{P}}, \{\mathcal{R}_i(\cdot)\}_{i \in \mathcal{P}}, \Phi(\mathcal{A}))$, where

• $\mathcal{P} = \{1, \dots, N\}$ is the set of $N$ players

• $\mathcal{A} = \mathcal{A}_1 \times \dots \times \mathcal{A}_N$ is the product set of all agents' action sets $\mathcal{A}_i$

• $U_i : \mathcal{A} \to \mathbb{R}$ are the agents' individual utility functions

• $\mathcal{R}_i : \mathcal{A} \to 2^{\mathcal{A}_i}$ is a function that maps a joint action to a restricted action set for agent $i$

Additionally, the agents' utility functions are aligned with a global objective function or potential $\Phi : \mathcal{A} \to \mathbb{R}$ if, for all agents $i \in \mathcal{P}$, all actions $a'_i \in \mathcal{R}_i(a)$, and actions of other agents $a_{-i} \in \prod_{j \neq i} \mathcal{A}_j$, the following holds:

$$U_i(a'_i, a_{-i}) - U_i(a_i, a_{-i}) = \Phi(a'_i, a_{-i}) - \Phi(a_i, a_{-i})$$

The last condition of Def. 1 implies an alignment of agents' individual incentives and the global goal. Under a unilateral change (only agent $i$ changes its action from $a_i$ to $a'_i$), the change in utility for agent $i$ equals the change in the global potential $\Phi$. This is a highly desirable property, since the maximization of all agents' individual utilities yields a maximum global potential. We can now formulate the self-reconfiguration problem in game-theoretic terms and show that it is indeed a constrained potential game.

Definition 2: Game-theoretic self-reconfiguration can be formulated as a constrained potential game, where the individual components are defined as follows:

• The set of players $\mathcal{P} = \{1, 2, \dots, N\}$ is the set of all $N$ agents in the configuration.

• The action set $\mathcal{A}_i$ of each agent is a set of discrete lattice positions (a finite or infinite subset of $\mathbb{Z}^d$).

• The utility function of each agent is $U_i(a) = \frac{1}{\mathrm{dist}(a_i, C_T) + 1}$, where $C_T$ is the target configuration and $\mathrm{dist}(a_i, C_T) = \min_{a_j \in C_T} \|a_i - a_j\|$.

• The restricted action sets $\mathcal{R}_i(a)$, $a \in \mathcal{A}$, are computed according to Section III-C.

• The global potential is $\Phi(a) = \sum_{i \in \mathcal{P}} U_i(a)$, $a \in \mathcal{A}$.

Note that the utility of an agent is independent of all other agents' actions and depends exclusively on its distance to the target configuration. An agent's action set, however, is constrained by its own as well as other agents' actions. The goal of the game-theoretic self-reconfiguration problem is to maximize the potential function, i.e.

$$\max_{a \in \mathcal{A}} \Phi(a) = \max_{a \in \mathcal{A}} \sum_{i \in \mathcal{P}} U_i(a)$$

This can be interpreted as a coverage problem where the goal of all agents is to cover all positions in the target configuration. Therefore, maximizing the potential is equivalent to maximizing the number of agents that cover target positions $a_i \in C_T$: each $U_i(a) \le 1$, with equality exactly when $a_i \in C_T$, so the maximum $\Phi = N$ is attained when every agent covers a target position. The following proposition shows that this formulation indeed yields a potential game.
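As a small numerical illustration of Def. 2 and the exact potential condition of Def. 1, the sketch below computes $U_i$ and $\Phi$ for a joint action given as a list of grid cells. It assumes the Euclidean norm for $\|a_i - a_j\|$ (the text does not fix the norm), and the helper names `dist`, `utility`, and `potential` are hypothetical.

```python
import math


def dist(a_i, targets):
    """dist(a_i, C_T): distance from a_i to the closest target cell (Euclidean norm assumed)."""
    return min(math.dist(a_i, c) for c in targets)


def utility(a_i, targets):
    """U_i(a) = 1 / (dist(a_i, C_T) + 1); depends only on agent i's own action."""
    return 1.0 / (dist(a_i, targets) + 1.0)


def potential(a, targets):
    """Phi(a) = sum of all agents' utilities for the joint action a (a list of cells)."""
    return sum(utility(a_i, targets) for a_i in a)


targets = {(0, 0), (1, 0)}
a = [(0, 0), (3, 0)]        # agent 0 covers a target, agent 1 is 2 cells away
a_new = [(0, 0), (1, 0)]    # unilateral change of agent 1 only

delta_u = utility(a_new[1], targets) - utility(a[1], targets)
delta_phi = potential(a_new, targets) - potential(a, targets)
print(potential(a, targets), potential(a_new, targets), delta_u, delta_phi)
```

Here $\Phi$ rises from $4/3$ to $N = 2$, and the change in agent 1's utility equals the change in the potential, as required by Def. 1.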

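Proposition 1: The self-reconfiguration problem in Def. 2 constitutes a constrained potential game with $\Phi(a) = \sum_{i \in \mathcal{P}} U_i(a)$ and $U_i(a) = \frac{1}{\mathrm{dist}(a_i, C_T) + 1}$.

Proof: Let the agents' utility functions be $U_i(a) = \Phi(a_i, a_{-i}) - \Phi(a_i^0, a_{-i})$, where $a_i^0$ denotes the null action of agent $i$, which is equivalent to removing agent $i$ from the environment. Then an agent's utility is its marginal contribution to coverage (Wonderful Life Utility, see [1]), or in other words, the grid cells covered exclusively by agent $i$. But since each agent covers exactly the grid cell it currently occupies, and since each $U_j$ depends only on $a_j$, removing agent $i$ leaves the utilities of all other agents unchanged. Hence the following holds:

$$U_i(a_i, a_{-i}) = \Phi(a_i, a_{-i}) - \Phi(a_i^0, a_{-i}) = \sum_{j \in \mathcal{P}} U_j(a) - \sum_{j \in \mathcal{P} \setminus \{i\}} U_j(a) = U_i(a)$$

Therefore, for any unilateral change from $a_i$ to $a'_i$, the terms $\Phi(a_i^0, a_{-i})$ cancel and $U_i(a'_i, a_{-i}) - U_i(a_i, a_{-i}) = \Phi(a'_i, a_{-i}) - \Phi(a_i, a_{-i})$, which is the condition of Def. 1. ■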

As we will see in Section VI this potential game structure 
allows us to derive a decentralized version of the presented 
learning algorithm. 


C. Action Set Computation 

A core component of constrained potential games is the 
computation of restricted action sets. Unlike previous work 
(see for example [14] and [21]), agents in our setup are 
constrained not just by their own actions, but also those of 
others. In this section we present methods for computing 
restricted action sets such that agents comply with motion 
constraints as well as constraints imposed by other agents. 

a) 2D reconfiguration: In the two-dimensional case 
agents are restricted to motions on the xy-plane. Unlike 
in previous work (see [17] and [18]) where we required a 
configuration to remain connected at all times, in this work, 
agents are allowed to disconnect from all (or a subset of) 
other agents. This approach enables agents to separate from 
and merge with other agents at a later time. To formalize 
this idea, we first review some graph theoretic concepts. 

Definition 3: Let $G = (V, E)$ be the graph composed of $N$ nodes with $V = \{v_1, v_2, \dots, v_N\}$, where node $v_i$ represents agent or location $i$. Then $G$ is called the connectivity graph of configuration $C$ if $E \subseteq V \times V$ with $e_{ij} \in E$ if $\|a_i - a_j\| = 1$.

This definition implies that two nodes $v_i, v_j$ in the connectivity graph are adjacent if agents or locations $i$ and $j$ are located in neighboring grid cells. Note that a connectivity graph can be computed for any set of grid positions, whether these positions are occupied by agents or not. We furthermore use the notions of paths on graphs and graph connectivity in the usual graph-theoretic sense. Note that $G$ is not necessarily connected, as (groups of) agents can split off. Therefore, $G$ generally consists of connected components $C_i$ such that $G = \{C_1, C_2, \dots, C_m\}$. Since the edge set $E$ of $G$ is time-varying, the number $m$ of connected components changes with time as well. Based on the connectivity graph $G$ and the current joint action, we now define the function $\mathcal{R}_i : \mathcal{A} \to 2^{\mathcal{A}_i}$, which maps from the full joint action set to a restricted action set for agent $i$ and is based on the following two definitions of sets of primitive actions.

Definition 4: The set of all currently possible sliding motions is $M_s = \{a'_i \in \mathbb{Z}^d \setminus a_{-i} : \|m_s\|_{L_1} = 1\}$, where $m_s = a'_i - a_i$.

Definition 5: The set of all currently possible corner motions is $M_c = \{a'_i \in \mathbb{Z}^d \setminus a_{-i} : \|m_c\|_{L_1} = 2 \wedge |m_{c,j}| \in \{0, 1\}\}$, where $j \in \{1, \dots, d\}$ and $m_c = a'_i - a_i$.







Note that $M_s$ and $M_c$ in Def. 4 and Def. 5 are equally applicable to 2D and 3D. These definitions encode the motion model outlined in Section III-A and allow us to define the restricted action set in two dimensions as follows.

Definition 6: The two-dimensional restricted action set is given by $\mathcal{R}_i^{2D}(a) = M_s \cup M_c$.

This definition ensures that agent $i$ can only move to unoccupied neighboring grid positions $a'_i$ through sliding or corner motions (or stay at its current position; see Algorithm 1 and Algorithm 2). All other agents replay their current actions $a_{-i}$.
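Def. 6 can be sketched as a scan over the eight neighboring cells of agent $i$. This is a minimal illustration, assuming a finite rectangular grid with hypothetical bounds `width` and `height` and a joint action given as a list of distinct cells; like Def. 4 and Def. 5 as written, it checks only that the destination is free and inside the grid.

```python
def restricted_action_set_2d(i, positions, width, height):
    """Sketch of R_i^{2D}(a) = M_s ∪ M_c from Def. 6.

    positions: list of (x, y) cells, one per agent (the joint action a).
    width, height: assumed finite grid 0 <= x < width, 0 <= y < height.
    Returns the unoccupied cells reachable by one sliding or corner motion.
    """
    x, y = positions[i]
    occupied = set(positions) - {positions[i]}   # cells held by a_{-i}
    actions = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if (dx, dy) == (0, 0):
                continue  # m = 0 is neither a sliding nor a corner motion
            cell = (x + dx, y + dy)
            inside = 0 <= cell[0] < width and 0 <= cell[1] < height
            if inside and cell not in occupied:
                actions.append(cell)
    return actions


positions = [(1, 1), (2, 1), (1, 2)]
print(restricted_action_set_2d(0, positions, 4, 4))
```

The two occupied neighbors of agent 0 are excluded, leaving six candidate cells. An empty result corresponds to the agent keeping its current position.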

b) 3D reconfiguration: Whereas in the 2D case agents were allowed to move to all unoccupied neighboring grid cells regardless of connectivity constraints, in the three-dimensional case we introduce the requirement of groundedness. An agent is immobile if executing an action would remove groundedness from any of its neighbors. Groundedness requires a notion of ground plane, which is defined as follows.

Definition 7 (Ground Plane): The ground plane is the set $S_{GP} = \{s \in \mathcal{E} : s_z = 0\}$, where $\mathcal{E} \subset \mathbb{Z}^3$, and the corresponding connectivity graph is $G_{GP} = (V_{GP}, E_{GP})$ with $e_{ij} \in E_{GP}$ if $\|s_i - s_j\| = 1$.

Note that the ground plane is defined as the xy-plane and its connectivity graph $G_{GP}$ is, by definition, connected. Positions $s \in S_{GP}$ are not allowed to be occupied by agents; therefore $a_i \in \mathcal{A}_i \setminus S_{GP}$ for all $i \in \mathcal{P}$. Using the graph $G_{GP}$, we define $G' = (V', E')$ with $V' = V \cup V_{GP}$ and $E' \subseteq V' \times V'$ such that $e_{ij} \in E'$ if for $v_i, v_j \in V'$ we have $\|a_i - a_j\|_{L_1} = 1$. Note that $G'$ represents the current configuration including the ground plane, and each node $v_i \in V'$ represents an action of an agent or an unoccupied position in the ground plane.

Definition 8 (Groundedness): An agent $i$ is grounded if there exists a path on $G'$ from $v_i \in V \subset V'$ to some $v_k \in V_{GP} \subset V'$, where $v_i$ represents agent $i$ in the connectivity graph $G$ (see Def. 3). A configuration $C$ is grounded if every agent $i \in \mathcal{P}$ is grounded.

The idea behind groundedness hints at an embedding of a self-reconfigurable system in the physical world, where agents cannot choose arbitrary positions in the environment (e.g. float in free space). More importantly, we use the notion of groundedness to prove completeness of deterministic reconfiguration in Section IV and irreducibility of the underlying Markov chain in Section V.

Fig. 2: Examples of grounded configurations and feasible motions of cubes.

An agent can verify groundedness in a computationally cheap way through a depth-first search, which is complete and guaranteed to terminate in time proportional to $O(N)$ in a finite space. The notion of groundedness also informs the restricted action set computation. If all neighbors $\mathcal{N}_i = \{v_j \in V : e_{ij} \in E\}$ (adjacency according to $G$ in Def. 3) of agent $i$ can compute an alternate path to ground (other than through agent $i$), then agent $i$ is allowed to move. To formalize this idea, let $G_{-i} = (V_{-i}, E_{-i})$ with $V_{-i} = (V \cup V_{GP}) \setminus \{v_i\}$ and $E_{-i} \subseteq V_{-i} \times V_{-i}$ such that $e_{jk} \in E_{-i}$ if for $v_j, v_k \in V_{-i}$ we have $\|a_j - a_k\|_{L_1} = 1$. $G_{-i}$ is therefore the connectivity graph of the current configuration including the ground plane without agent $i$. Subsequently, $\mathcal{R}_i^{3D}(a)$ is defined as follows.

Definition 9: The three-dimensional restricted action set is $\mathcal{R}_i^{3D}(a) = M_s \cup M_c$ if all agents $v_j \in \mathcal{N}_i$ are grounded on $G_{-i}$. Otherwise, $\mathcal{R}_i^{3D}(a) = \{a_i\}$.

This definition encodes the same criteria as the two-dimensional action set with the additional constraint of maintaining groundedness (see Fig. 2). If executing an action would leave any of agent $i$'s neighbors ungrounded, agent $i$ is not allowed to move.
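The groundedness test and Def. 9 can be sketched with an iterative depth-first search over face-adjacent cubes. This is a minimal illustration under two assumptions: the ground plane extends under every cell, so a cube at $z = 1$ rests directly on $S_{GP}$, and, following Def. 9 as stated, only the groundedness of agent $i$'s neighbors is checked. The helper names `is_grounded` and `restricted_action_set_3d` are hypothetical.

```python
from itertools import product

FACE = [m for m in product((-1, 0, 1), repeat=3) if sum(map(abs, m)) == 1]
PRIMITIVE = [m for m in product((-1, 0, 1), repeat=3) if sum(map(abs, m)) in (1, 2)]


def shift(p, m):
    return tuple(pi + mi for pi, mi in zip(p, m))


def is_grounded(start, occupied):
    """Iterative DFS over face-adjacent occupied cells.

    A cube at z == 1 rests on the ground plane S_GP (z == 0), so the component
    of `start` is grounded iff it contains such a cube.
    """
    stack, seen = [start], {start}
    while stack:
        p = stack.pop()
        if p[2] == 1:
            return True
        for m in FACE:
            q = shift(p, m)
            if q in occupied and q not in seen:
                seen.add(q)
                stack.append(q)
    return False


def restricted_action_set_3d(i, positions):
    """Sketch of R_i^{3D}(a) from Def. 9 for a list of (x, y, z) cells."""
    a_i = positions[i]
    others = set(positions) - {a_i}              # configuration without agent i (G_{-i})
    neighbors = [shift(a_i, m) for m in FACE if shift(a_i, m) in others]
    if not all(is_grounded(n, others) for n in neighbors):
        return [a_i]                             # moving would unground a neighbor
    return [q for q in (shift(a_i, m) for m in PRIMITIVE)
            if q[2] >= 1 and q not in others]    # z = 0 cells may not be occupied


tower = [(0, 0, 1), (0, 0, 2), (0, 0, 3)]
print(restricted_action_set_3d(1, tower))        # middle cube is immobile
print(len(restricted_action_set_3d(2, tower)))   # top cube is mobile
```

In a three-cube tower, only the top cube is mobile: removing the middle or bottom cube would leave a neighbor without a path to the ground plane.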

IV. Deterministic Completeness 

In this section we establish completeness of deterministic reconfiguration in two and three dimensions. We show that for any two configurations $C_I$ and $C_T$ there exists a deterministically determined sequence of individual agent actions such that configuration $C_I$ is reconfigured into $C_T$. These results are required to show irreducibility of the Markov chain induced by the learning algorithm outlined in Section V. Irreducibility guarantees the existence of a unique stationary distribution and furthermore a unique potential function maximizer. We first show completeness of 2D reconfiguration.

Theorem 1 (Completeness of 2D reconfiguration): Any given two-dimensional configuration $C_I$ can be reconfigured into any other two-dimensional configuration $C_T$, i.e. there exists a finite sequence of configurations $\{C_I = C_0, C_1, \dots, C_m = C_T\}$ such that two consecutive configurations differ only in one individual agent motion.

Proof: Without loss of generality, assume that $C_I$ and $C_T$ do not overlap, i.e. for no $c_i \in C_I$ is it true that also $c_i \in C_T$. Additionally, assume that $C_I$ and $C_T$ are separated along one dimension $k \in \{e_x, e_y, e_z\}$ (with $e_x, e_y, e_z$ being the basis vectors of the lattice), i.e. for all $c_i \in C_I$ we have $c_{i,k} < c_{j,k}$ for all $c_j \in C_T$. Then at each time step $t$, select the agent $i$ whose current position $c_i \in C$ is closest to an unoccupied position $c_j \in C_T$. Plan a deterministic path of primitive agent motions $p_i = \{c_i = c_i^0 \to c_i^1 \to \dots \to c_i^m = c_j\}$ using a complete path planner such as A*. Note that such a path always exists since we do not require agents to remain connected to any other agents. Therefore, the path planning problem is reduced to single-agent path planning on a discrete finite grid, which A* solves because it is complete. This greedy selection of agent-target pairs, together with a complete path planning approach, suffices to reconfigure any two-dimensional configuration into any other two-dimensional configuration (similar to flood-fill algorithms). ■
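The constructive argument in the proof of Theorem 1 can be sketched as follows. This is an illustration rather than the authors' implementation. Breadth-first search is substituted for A* (both are complete on a finite grid), all eight neighboring cells are treated as reachable through sliding or corner motions as in Def. 4 and Def. 5, and agents that already cover a target are never moved again. The grid bounds and helper names are hypothetical.

```python
import math
from collections import deque

# Sliding and corner motions in 2D (all 8 neighbors, as in Def. 4 and Def. 5)
MOVES = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def bfs_path(start, goal, blocked, width, height):
    """Shortest collision-free path on a finite grid; BFS is complete, like A*."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        if p == goal:
            path = []
            while p is not None:
                path.append(p)
                p = parent[p]
            return path[::-1]
        for dx, dy in MOVES:
            q = (p[0] + dx, p[1] + dy)
            if (0 <= q[0] < width and 0 <= q[1] < height
                    and q not in blocked and q not in parent):
                parent[q] = p
                queue.append(q)
    return None


def greedy_reconfigure(initial, target, width, height):
    """Greedy agent-target pairing plus path planning; returns every visited configuration."""
    assert len(initial) == len(set(target)), "need as many agents as target cells"
    config, target = list(initial), set(target)
    history = [tuple(config)]
    while set(config) != target:
        free = target - set(config)
        movers = [k for k, c in enumerate(config) if c not in target]
        # pick the agent/target pair with the smallest Euclidean distance
        k, goal = min(((k, t) for k in movers for t in free),
                      key=lambda kt: math.dist(config[kt[0]], kt[1]))
        path = bfs_path(config[k], goal, set(config) - {config[k]}, width, height)
        if path is None:
            raise RuntimeError("no collision-free path for the selected pair")
        for cell in path[1:]:          # one primitive motion per step
            config[k] = cell
            history.append(tuple(config))
    return history


steps = greedy_reconfigure([(0, 0), (1, 0), (2, 0)], [(0, 4), (1, 4), (2, 4)], 5, 5)
print(len(steps) - 1, steps[-1])
```

Each entry of the returned history differs from the previous one in a single agent motion, matching the sequence $\{C_0, \dots, C_m\}$ in Theorem 1.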

The result in Theorem 1 holds for any configuration, even configurations that consist of multiple connected components. Before we can show a similar result for the 3D case, we need to introduce a graph-theoretic result.

Lemma 1: According to Lemma 6 in [19], any finite graph with at least two vertices contains at least two vertices which are not articulation points.²

Theorem 2 (Completeness of 3D to 2D reconfiguration): Any finite grounded 3D configuration $C^{G,3D}$ can be reconfigured into a 2D configuration $C^{2D}$, i.e. there exists a finite sequence of configurations $\{C^{G,3D} = C_0, C_1, \dots, C_m = C^{2D}\}$ such that two consecutive configurations differ only in one individual agent motion.

Proof: Without loss of generality, assume that the connectivity graph of $C^{G,3D}$ consists of one connected component. In any finite grounded 3D configuration, there always exists an agent $i \in \mathcal{P}$ with a non-trivial restricted action set $\mathcal{R}_i(a) \neq \{a_i\}$. Agent $i$ is therefore mobile, and there exists a finite path of individual agent motions $p_i = \{a_i = a_i^0, a_i^1, \dots, a_i^m\}$ such that $\|a_i^m - s\|_{L_1} = 1$ for some $s \in S_{GP}$, i.e. the agent's final action is a position on the ground plane $S_{GP}$.

Let the subset of agents $V_{z>1} \subset V$ contain those agents whose positions are not on the ground plane, and let $G_{z>1} = (V_{z>1}, E_{z>1})$ be the corresponding connectivity graph. Furthermore, let $G'' = (V'', E'')$ be such that $V'' = V_{z>1} \cup \{v_{GP}\}$, where $v_{GP}$ is a single node representing all the agents on the ground plane, and $E''$ such that $e_{ij} \in E''$ if for $v_i, v_j \in V''$ we have $\|a_i - a_j\|_{L_1} = 1$. According to Def. 9, an agent $i$ is mobile if it is not an articulation point in the connectivity graph $G''$ (see Fig. 3). In $G''$, $v_{GP}$ may or may not be a non-articulation point, but according to Lemma 1, every connected graph contains at least two non-articulation points. Therefore, there exists at least one agent $i$ with a non-trivial restricted action set. For that agent, one can compute a deterministic action sequence that moves agent $i$ to the ground plane. In other words, at each iteration $t$ we transfer one vertex from $V_{z>1}$ to $v_{GP}$ such that $|V_{z>1}^{t+1}| = |V_{z>1}^t| - 1$, until $|V_{z>1}| = 0$ and all agents have been moved to $v_{GP}$. This process terminates in a finite number of time steps because the initial configuration $C^{G,3D}$ is finite. The result is a 2D configuration $C^{2D}$ represented by $G''$. ■
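Lemma 1, as it is used in the proof of Theorem 2, can be checked on small examples. The sketch below finds non-articulation points by removing each vertex and testing whether the rest stays connected. This is a naive $O(V(V+E))$ check rather than Tarjan's linear-time algorithm. The example graph is a hypothetical $G''$ in which ground-level cubes are collapsed into the node `"GP"`, with cube `"a"` stacked on them and cube `"b"` on top.

```python
def is_connected(nodes, adj):
    """True if the subgraph induced by `nodes` is connected (iterative DFS)."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in nodes and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes


def non_articulation_points(adj):
    """Vertices whose removal keeps a connected graph connected (naive O(V(V+E)) check)."""
    nodes = set(adj)
    return [v for v in adj if is_connected(nodes - {v}, adj)]


# Hypothetical G'': ground-level cubes collapsed into "GP", cube "a" above them, "b" on top
adj = {"GP": ["a"], "a": ["GP", "b"], "b": ["a"]}
print(non_articulation_points(adj))
```

Both `"GP"` and `"b"` are non-articulation points, so agent `"b"` is the one the proof would move toward the ground plane. In a star-shaped $G''$, `"GP"` itself would be an articulation point, but the leaves would still qualify.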


2 An articulation point is a vertex in a graph whose removal would 
disconnect the graph. 



Fig. 3: Example of a grounded configuration $C^G$, the ground plane $S_{GP}$, and the associated connectivity graph $G''$. $G_{z>1}$ represents all agents not on the ground plane, while all agents on the ground plane are represented by a single node in $G''$.


Corollary 1: Any finite grounded 3D configuration $C_I^{G,3D}$ can be reconfigured into any other finite grounded 3D configuration $C_T^{G,3D}$.

Proof: According to Theorem 2, any finite grounded 3D configuration $C_I^{G,3D}$ can be reduced to an intermediate 2D configuration $C_I^{2D}$ in a finite number of steps. The reverse is also true: any finite grounded 3D configuration $C_T^{G,3D}$ can be assembled from some 2D configuration $C_T^{2D}$ in a finite number of steps. According to Theorem 1, any 2D configuration $C_I^{2D}$ can be reconfigured into any other 2D configuration $C_T^{2D}$. Therefore, there exists a deterministic finite action sequence from $C_I^{G,3D}$ to $C_T^{G,3D}$. ■

V. Stochastic Reconfiguration 

In this and the following section we present a stochastic 
reconfiguration algorithm that is fully distributed, does not 
require any precomputation of paths or actions, and can 
adapt to changing environment conditions. Unlike log-linear 
learning ([2]), which cannot handle restricted action sets, 
and variants such as binary log-linear learning ([1], [13], 
[14]), which can only handle action sets constrained by 
an agent’s own previous action, the presented algorithm 
guarantees convergence to the potential function maximizer 
even if action sets are constrained by all agents’ actions. 

Our algorithm is based on the Metropolis-Hastings algorithm ([15], [10]), which allows the design of transition probabilities such that the stationary distribution of the underlying Markov chain is a desired target distribution, which we choose to be the Gibbs distribution. This choice enables a distributed implementation of the learning rule in Theorem 3 through the potential game formalism (see Corollary 2). The Metropolis-Hastings algorithm guarantees two results: existence and uniqueness of a stationary distribution. We will use these properties to show that the only stochastically stable state is $x^*$, the potential function maximizer.

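Theorem 3: Given any two states $x_i$ and $x_j$ representing global configurations, the transition probabilities

$$p_{ij} = \begin{cases} e^{\frac{1}{\tau}(\Phi(x_j) - \Phi(x_i))}\, q_{ji} & \text{if } e^{\frac{1}{\tau}(\Phi(x_j) - \Phi(x_i))} \frac{q_{ji}}{q_{ij}} < 1 \\ q_{ij} & \text{o.w.} \end{cases}$$

guarantee that the unique stationary distribution of the underlying Markov chain is a Gibbs distribution of the form

$$\Pr[X = x] = \frac{e^{\frac{1}{\tau}\Phi(x)}}{\sum_{x' \in \mathcal{X}} e^{\frac{1}{\tau}\Phi(x')}}$$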

Proof: Let $\mathcal{X}$ be a finite state space containing all possible states of configurations composed of $N$ agents.³ On that state space, let the desired target distribution be

$$\pi(x) = \Pr[X = x] = \frac{e^{\frac{1}{\tau}\Phi(x)}}{\sum_{x' \in \mathcal{X}} e^{\frac{1}{\tau}\Phi(x')}}$$

with $\Phi$ defined in Def. 2. By applying the Metropolis-Hastings algorithm, we can compute transition probabilities $P = \{p_{ij}\}$ such that $\pi$ is the stationary distribution of $P$, i.e. $\pi = \pi P$.

In the Metropolis-Hastings algorithm, a transition probability is represented as $p_{ij} = g(x_i \to x_j)\,\alpha(x_i \to x_j)$, where $g(x_i \to x_j)$ is the proposal distribution and $\alpha(x_i \to x_j)$ is the acceptance distribution. Both are conditional probabilities of proposing or accepting a state $x_j$ given that the current state is $x_i$.


Let agent $k$ achieve the transition from state $x_i$ to $x_j$ through action $a'_k \in \mathcal{R}_k(a)$. Then one possible choice for the proposal distribution is $g(x_i \to x_j) = q_{ij} = \frac{1}{|\mathcal{R}_k|}$ for all $j \in \{1, \dots, |\mathcal{R}_k|\}$, i.e. a random choice among all available actions of agent $k$. According to Hastings [10], a popular choice for the acceptance distribution is the Metropolis choice $\alpha_{ij} = \min\left\{1, \frac{\pi_j q_{ji}}{\pi_i q_{ij}}\right\}$. Note that unlike in the original formulation [15], we do not assume that $q_{ij} = q_{ji}$ (see Fig. 4 for an illustration of $q_{ij}$ and $q_{ji}$). These choices result in the following transition probabilities:

$$p_{ij} = \begin{cases} \frac{\pi_j}{\pi_i} q_{ji} & \text{if } \frac{\pi_j q_{ji}}{\pi_i q_{ij}} < 1 \\ q_{ij} & \text{o.w.} \end{cases}$$


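It is easily verified that these $p_{ij}$ satisfy the detailed balance equation $\pi_i p_{ij} = \pi_j p_{ji}$ and thus guarantee the existence of a stationary distribution (see [15] and [10]). The resulting $p_{ij}$ of Theorem 3 follow from the definition $\pi_i = \frac{e^{\frac{1}{\tau}\Phi(x_i)}}{\sum_{x' \in \mathcal{X}} e^{\frac{1}{\tau}\Phi(x')}}$ and similarly $\pi_j$, since the normalizing constants cancel and $\frac{\pi_j}{\pi_i} = e^{\frac{1}{\tau}(\Phi(x_j) - \Phi(x_i))}$.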

Uniqueness of the stationary distribution follows from the irreducibility of the Markov chain induced by $P = \{p_{ij}\}$. This holds for our choice of proposal and acceptance distributions because they assign a nonzero probability to every action in any restricted action set. By Theorem 2 and Corollary 1 we know that any state $x_j$ can be reached from any other state $x_i$ and vice versa. Thus any such action path has nonzero probability, and irreducibility follows. ■
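The construction in the proof of Theorem 3 can be checked numerically on a state space small enough to enumerate. The sketch below builds $P$ from uniform proposals and the Metropolis acceptance, then verifies detailed balance and stationarity of the Gibbs distribution. The toy chain is hypothetical and much simpler than the paper's setup: a single agent moves on cells 0 to 4 with target cell 4, and the boundary cells have fewer moves, so $q_{ij} \neq q_{ji}$ in general.

```python
import math


def mh_matrix(states, neighbors, phi, tau):
    """Transition matrix of Theorem 3 on a small, explicitly enumerated state space.

    neighbors(x) plays the role of the restricted action set, proposals are
    uniform (q_ij = 1/|R|), and acceptance is the Metropolis choice for a Gibbs
    target proportional to exp(phi(x)/tau).
    """
    n = len(states)
    index = {s: k for k, s in enumerate(states)}
    P = [[0.0] * n for _ in range(n)]
    for s in states:
        i = index[s]
        q_ij = 1.0 / len(neighbors(s))
        for t in neighbors(s):
            j = index[t]
            q_ji = 1.0 / len(neighbors(t))
            ratio = math.exp((phi(t) - phi(s)) / tau) * q_ji / q_ij
            P[i][j] = q_ij * min(1.0, ratio)  # equals the two cases of Theorem 3
        P[i][i] = 1.0 - sum(P[i][j] for j in range(n) if j != i)  # p_ii = 1 - sum_{j != i} p_ij
    return P


def gibbs(states, phi, tau):
    w = [math.exp(phi(s) / tau) for s in states]
    z = sum(w)
    return [x / z for x in w]


def neighbors(x):
    """Hypothetical restricted action set: step left or right within cells 0..4."""
    return [y for y in (x - 1, x + 1) if 0 <= y <= 4]


def phi(x):
    """Potential of the one-agent toy chain, using the utility of Def. 2 with target 4."""
    return 1.0 / (abs(x - 4) + 1)


states, tau = list(range(5)), 0.5
P = mh_matrix(states, neighbors, phi, tau)
pi = gibbs(states, phi, tau)
pi_next = [sum(pi[i] * P[i][j] for i in range(5)) for j in range(5)]
print(max(abs(u - v) for u, v in zip(pi, pi_next)))  # round-off level
```

Because $\pi_i p_{ij} = \min(\pi_i q_{ij}, \pi_j q_{ji})$ is symmetric in $i$ and $j$, detailed balance holds even though the proposals are asymmetric.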

Theorem 3 applies equally to 2D and 3D configurations. However, for 3D reconfiguration, the proof relies implicitly on the notion of groundedness to show irreducibility of the underlying Markov chain (through the computation of $\mathcal{R}_i^{3D}$). The following theorem requires the definition of stochastic stability.

Definition 10 (Stochastic Stability [20]): A state $x_i \in \mathcal{X}$ is stochastically stable relative to a Markov process $P^\epsilon$ if the following holds for the stationary distribution $\pi^\epsilon$:

$$\lim_{\epsilon \to 0} \pi_{x_i}^\epsilon > 0$$


3 Such a space is finite if we assume translation and rotation invariance 
of the configuration as well as a finite environment. 
Fig. 4: Example of forward and reverse actions with their associated proposal probabilities $q_{ij}$, $q_{ji}$, and $q_k$. Note that $x_i$, $x_j$, $x'_j$ are states of the entire configuration, and agent $k$ is the currently active agent.


Note that the Markov process is defined through the transition probabilities in Theorem 3, and the stationary distribution is a Gibbs distribution. Furthermore, $\epsilon$ corresponds to the learning rate $\tau$ in the following proof.

Theorem 4: Consider the self-reconfiguration problem in Def. 2. If all players adhere to the learning rule in Theorem 3, then the unique stochastically stable state $x^*$ is the state that maximizes the global potential function.

Proof: Note that this result holds by definition of the desired target distribution. This Gibbs distribution is centered at the state of maximum global potential and defines the probability of being in a state $x$ as $\Pr[X = x] = \frac{e^{\frac{1}{\tau}\Phi(x)}}{\sum_{x' \in \mathcal{X}} e^{\frac{1}{\tau}\Phi(x')}}$. The learning rate or temperature $\tau$ represents the willingness of an agent to make a suboptimal choice (i.e. explore the state space). Since the denominator is at least $e^{\frac{1}{\tau}\Phi(x^*)}$, we have $\Pr[X = x] \le e^{\frac{1}{\tau}(\Phi(x) - \Phi(x^*))}$. Hence, as $\tau \to 0$, $\Pr[X = x] \to 0$ for all states $x \neq x^*$ which are not potential function maximizers (see [2] and [14]). ■
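The concentration of the Gibbs distribution in the proof of Theorem 4 can be seen numerically. The potentials below are hypothetical values for three configurations of $N = 3$ agents, with index 0 as the maximizer $x^*$ ($\Phi = N$). The maximum is subtracted before exponentiating so that a temperature as small as $\tau = 0.001$, the value used in Section VII, does not overflow `math.exp`.

```python
import math


def gibbs(potentials, tau):
    """Gibbs distribution exp(Phi/tau)/Z; the maximum is subtracted first so that
    small temperatures such as tau = 0.001 do not overflow math.exp."""
    top = max(potentials)
    w = [math.exp((p - top) / tau) for p in potentials]
    z = sum(w)
    return [x / z for x in w]


# Hypothetical potentials of three configurations with N = 3 agents;
# index 0 is the maximizer x* (Phi = N).
phis = [3.0, 2.5, 2.0 + 1.0 / 3.0]
for tau in (1.0, 0.1, 0.01, 0.001):
    print(tau, gibbs(phis, tau)[0])
```

The probability of $x^*$ grows from below one half at $\tau = 1$ to 1 within floating-point precision at $\tau = 0.01$. Each suboptimal state stays below the bound $e^{(\Phi(x) - \Phi(x^*))/\tau}$ used in the proof.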

Note that the maximum global potential is achieved when all agents are at target positions $a_i \in C_T$. Algorithm 1 shows an implementation of Theorem 3. In Algorithm 1, similarly to the Metropolis-Hastings algorithm in [10], $p_{ii} = 1 - \sum_{j \neq i} p_{ij}$.

VI. A Decentralized Algorithm


One shortcoming of Algorithm 1 is its centralized nature. It requires the computation of a global potential function $\Phi(x_i)$, which depends on the entire current configuration $x_i$. A decentralized algorithm is desirable to execute the global learning rule on a team of agents. The formulation of the self-reconfiguration problem as a potential game allows us to rewrite the transition probabilities in a decentralized fashion as follows.

Corollary 2: The global learning rule of Theorem 3 can be decentralized such that each agent can execute it with local information only.

Proof: Note that for agent $k$, we can express $x_j$ and $x_i$ as $(a'_k, a_{-k})$ and $(a_k, a_{-k})$, respectively. According to Proposition 1, we can then rewrite $\Phi(x_j) - \Phi(x_i)$ as follows.

$$\begin{aligned} U_k(a'_k) - U_k(a_k) &= U_k(a'_k, a_{-k}) - U_k(a_k, a_{-k}) \\ &= \Phi(a'_k, a_{-k}) - \Phi(a_k, a_{-k}) \\ &= \Phi(x_j) - \Phi(x_i) \end{aligned}$$

Therefore, we can rewrite the transition probabilities as follows.

$$p_{ij} = \begin{cases} q_{ji}\, e^{\frac{1}{\tau}(U_k(a'_k) - U_k(a_k))} & \text{if } e^{\frac{1}{\tau}(U_k(a'_k) - U_k(a_k))} \frac{q_{ji}}{q_{ij}} < 1 \\ q_{ij} & \text{o.w.} \end{cases}$$

Since $q_{ij}$, $q_{ji}$, $U_k(a'_k)$, and $U_k(a_k)$ can be computed with local information, so too can the transition probabilities. The stationary distribution of the process described by $p_{ij}$ is the same Gibbs distribution as in Theorem 3. ■

input: Current and target configuration $C$ and $C_T$
while True do
    Randomly pick an agent $k$; let $x_i$ be the current state
    Compute restricted action set $\mathcal{R}_k$
    Select $a'_k \in \mathcal{R}_k$ with probability $q_{ij} = 1/|\mathcal{R}_k|$, yielding state $x_j$
    Compute $\alpha_{ij} = \min\{1, e^{\frac{1}{\tau}(\Phi(x_j) - \Phi(x_i))} \frac{q_{ji}}{q_{ij}}\}$
    if $\alpha_{ij} = 1$ then
        $x_{t+1} = x_j$
    else
        $x_{t+1} = x_j$ with probability $\alpha_{ij}$, and $x_{t+1} = x_i$ with probability $1 - \alpha_{ij}$
    end
end

Algorithm 1: Global game-theoretic learning algorithm. Note that state $x_j$ is the result of agent $k$ applying action $a'_k \in \mathcal{R}_k$, and $x_i$ and $x_j$ refer to states of the entire configuration. Also note that the current action $a_k$ is generally not in $\mathcal{R}_k$, so $q_{ii} = 0$, but $p_{ii} \neq 0$.

input: Target configuration $C_T$
Start clock (see [3])
while True do
    if clock ticks then
        Compute current restricted action set $\mathcal{R}_k$
        Select $a' \in \mathcal{R}_k$ with probability $q = 1/|\mathcal{R}_k|$
        Compute $\alpha = \min\{1, e^{\frac{1}{\tau}(U(a') - U(a))} \frac{q'}{q}\}$, where $q' = 1/|\mathcal{R}'_k|$ and $\mathcal{R}'_k$ is the restricted action set of agent $k$ at $a'$
        if $\alpha = 1$ then
            $x_{t+1} = a'$
        else
            $x_{t+1} = a'$ with probability $\alpha$, and $x_{t+1} = a$ with probability $1 - \alpha$
        end
    end
end

Algorithm 2: Self-reconfiguration using game-theoretic learning: local version that each agent executes. Note that $x_t$, $x_{t+1}$ refer to consecutive states of the active agent.

Note that local can mean multiple hops, because the computation of restricted action sets requires maintaining groundedness of all neighboring agents. Verifying groundedness requires a depth-first search (DFS) to the ground plane, which in the worst case can take $N - 1$ hops.

Algorithm 2 shows a decentralized implementation of Algorithm 1 and Corollary 2. Note that a transition from configuration $x_i$ to $x_j$ is accomplished by agent $k$ executing action $a' \in \mathcal{R}_k$ starting at its current location $a$. Therefore, we can interpret $q_{ij}$ as a transition probability for a forward action and $q_{ji}$ as one for a reverse action (see Fig. 4).
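One agent's update in Algorithm 2 can be sketched for the 2D case, using only local information: the agent's own cell, which neighboring cells are occupied, and the target set. This is a minimal illustration under several assumptions. It uses the Euclidean distance in $U_k$ and a finite grid with hypothetical bounds, and the usage loop picks agents uniformly at random in place of the asynchronous clocks of [3]. The reverse proposal probability $q_{ji}$ comes from the restricted action set at $a'$, and the acceptance ratio is kept in log space so that small $\tau$ does not overflow. A 3D version would add the groundedness test of Def. 9.

```python
import math
import random

MOVES = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def utility(p, targets):
    """U_k = 1 / (dist + 1); Euclidean distance is an assumption."""
    return 1.0 / (min(math.dist(p, c) for c in targets) + 1.0)


def local_actions(p, occupied, width, height):
    """R_k^{2D}: free cells reachable from p by one sliding or corner motion."""
    cells = ((p[0] + dx, p[1] + dy) for dx, dy in MOVES)
    return [c for c in cells
            if 0 <= c[0] < width and 0 <= c[1] < height and c not in occupied]


def agent_step(k, config, targets, tau, width, height, rng):
    """One clock tick of agent k: uniform proposal, then Metropolis-Hastings acceptance."""
    a = config[k]
    others = set(config) - {a}
    forward = local_actions(a, others, width, height)
    if not forward:
        return config                    # no feasible motion: keep the current action
    a_new = rng.choice(forward)          # q_ij = 1 / |R_k|
    reverse = local_actions(a_new, others, width, height)  # contains a, so q_ji > 0
    # log of e^{(U(a') - U(a))/tau} * q_ji / q_ij, with q_ji / q_ij = |R_k| / |R'_k|
    delta_u = utility(a_new, targets) - utility(a, targets)
    log_ratio = delta_u / tau + math.log(len(forward) / len(reverse))
    alpha = 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)
    if rng.random() < alpha:
        config = list(config)
        config[k] = a_new
    return config


rng = random.Random(0)
config = [(0, 0), (1, 0), (0, 1)]
targets = {(5, 5), (6, 5), (5, 6)}
for _ in range(5000):
    k = rng.randrange(len(config))   # random agent selection stands in for the clocks of [3]
    config = agent_step(k, config, targets, 0.05, 10, 10, rng)
print(config, sum(utility(p, targets) for p in config))
```

With a very small $\tau$, an uphill move is always accepted and a downhill move almost never is. This matches the greedy-versus-exploration trade-off that Section VII attributes to the choice of $\tau$.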

VII. Implementation 

The global version of the algorithm was implemented and evaluated in Matlab. For the simulations shown in Fig. 5 we used $\tau = 0.001$, which struck a balance between greedy maximization of agent utilities and exploration of the state space through suboptimal actions. Agents' actions are restricted according to the action set computation outlined in Section III-C and the motion model in Section III-A. Restricted action sets depend on the agents' joint action and the environment, which in our simulations consists only of the ground plane. Agents' positions are initialized above the ground plane such that their z-coordinate satisfies $z \geq 1$. In a straightforward extension to this algorithm, obstacles can be added to restrict the environment even further.

Fig. 5 shows convergence results of Algorithm 1 for configurations containing 10, 20, and 30 agents. Four types of reconfigurations have been performed: 2D to 2D, 2D to 3D, 3D to 2D, and 3D to 3D. The vertical lines in Fig. 5 represent the average time to convergence of all four types of reconfigurations of a certain size (e.g. the leftmost line represents average convergence of a configuration of 10 agents). Convergence is achieved if the configuration reaches a global potential of $\Phi = N$, i.e. every agent has a utility of $U_i = 1$. Note that in the scenarios of Fig. 5, the target configuration was offset from the initial configuration by a translation of 10 units along the x-axis. One can observe that at the beginning of each reconfiguration the global potential ramps up very fast (within a few hundred time steps), but the asymptotic convergence to the global optimum can be slow (see the case 3D to 2D for 30 agents). An example of a 2D to 2D reconfiguration sequence is shown in Fig. 1.

VIII. Conclusion 

In this paper, we have applied the potential game formalism to the self-reconfiguration problem and developed a stochastic algorithm that converges to the global potential function maximizer. Under the assumption of groundedness, we have introduced a decentralized approach to self-reconfiguration that requires little communication, does not rely on a centralized decision maker or precomputation, and converges to the global optimum. Our simulation results suggest that this formulation of the self-reconfiguration problem is indeed a feasible approach and can solve self-reconfiguration in a decentralized fashion.














Fig. 5: Convergence times for different types of configurations and sizes ranging from 10 to 30 agents (plot title: time to convergence using $\tau = 0.001$; horizontal axis: time steps).